STEP 1

Reading in the data.

datapath <- "/Users/eampo/Desktop/Stats_Final_Project"

AssignmentData <- read.csv(file=paste(datapath,"regressionassignmentdata2014.csv",sep="/"), row.names=1,header=TRUE,sep=",")

head(AssignmentData)
##           USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR  Output1
## 1/5/1981   13.52  13.09  12.289   12.28  12.294   12.152   11.672 18.01553
## 1/6/1981   13.58  13.16  12.429   12.31  12.214   12.112   11.672 18.09140
## 1/7/1981   14.50  13.90  12.929   12.78  12.614   12.382   11.892 19.44731
## 1/8/1981   14.76  14.00  13.099   12.95  12.684   12.352   11.912 19.74851
## 1/9/1981   15.20  14.30  13.539   13.28  12.884   12.572   12.132 20.57204
## 1/12/1981  15.22  14.23  13.179   12.94  12.714   12.452   12.082 20.14218
##           Easing Tightening
## 1/5/1981      NA         NA
## 1/6/1981      NA         NA
## 1/7/1981      NA         NA
## 1/8/1981      NA         NA
## 1/9/1981      NA         NA
## 1/12/1981     NA         NA
tail(AssignmentData)
##           USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR
## 6/19/2014 0.0101 0.0456  0.4477  0.9255  1.6751   2.6206   3.4660
## 6/20/2014 0.0101 0.0355  0.4561  0.9256  1.6804   2.6052   3.4337
## 6/23/2014 0.0051 0.0355  0.4643  0.9470  1.7021   2.6261   3.4549
## 6/24/2014 0.0203 0.0507  0.4563  0.9311  1.6689   2.5781   3.3991
## 6/25/2014 0.0203 0.0456  0.4804  0.9124  1.6540   2.5592   3.3815
## 6/26/2014 0.0253 0.0507  0.4646  0.8803  1.6462   2.5214   3.3458
##             Output1 Easing Tightening
## 6/19/2014 -11.70638     NA         NA
## 6/20/2014 -11.71994     NA         NA
## 6/23/2014 -11.68766     NA         NA
## 6/24/2014 -11.73219     NA         NA
## 6/25/2014 -11.74946     NA         NA
## 6/26/2014 -11.79219     NA         NA

Analysis

After reading in the data, there are 7 input variables: daily records of U.S. Treasury yields to maturity from 1/5/1981 to 6/26/2014. The data also include an unknown output variable (Output1) and two indicator variables, ‘Easing’ and ‘Tightening’.

Plotting the 7 input variables.

matplot(AssignmentData[,-c(8,9,10)],type='l')

Analysis

After plotting the 7 input variables, the overall trend is that U.S. Treasury yields to maturity are decreasing. ‘Yield to maturity’ is the internal rate of return (IRR) earned by an investor who buys the bond today at the market price, assuming the bond is held until maturity. Treasury securities have traditionally been considered a safe investment because they are backed by the U.S. government, and their yields can function as economic indicators: high yields suggest investors are confident they can find other investments with better returns, while low yields suggest investors are settling for these safe investments. Treasury yields also have an impact on interest rates, particularly in real estate.
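To make the yield-to-maturity definition concrete, here is a small sketch (using a hypothetical bond, not the assignment data) that recovers YTM as the discount rate equating a bond’s discounted cash flows to its market price:

```r
# Yield to maturity: the rate y at which discounted coupons plus face value
# equal today's market price. Hypothetical bond: $100 face, 5% annual coupon,
# 10 years to maturity, currently priced at $95.
bond.price <- function(y, face = 100, coupon = 5, n = 10) {
  sum(coupon / (1 + y)^(1:n)) + face / (1 + y)^n
}
# Solve price(y) = 95 for y; uniroot() finds the root numerically.
ytm <- uniroot(function(y) bond.price(y) - 95, interval = c(1e-6, 1))$root
```

Because the bond is priced below par, the computed yield comes out above the 5% coupon rate, matching the intuition above that price and yield move inversely.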

Plotting the output variable together with the input variables.

matplot(AssignmentData[,-c(9,10)],type='l')

Analysis

The output variable is higher than the input variables in the early 1980s and has declined ever since. It starts to even out with the input variables later in the 1980s before dropping below them, and it has stayed below the input variables since then.

Box Plots for Input Variables

boxplot(AssignmentData$USGG3M, main="USGG3M")

boxplot(AssignmentData$USGG6M, main="USGG6M")

boxplot(AssignmentData$USGG2YR, main="USGG2YR")

boxplot(AssignmentData$USGG3YR, main="USGG3YR")

boxplot(AssignmentData$USGG5YR, main="USGG5YR")

boxplot(AssignmentData$USGG10YR, main="USGG10YR")

boxplot(AssignmentData$USGG30YR, main="USGG30YR")

Correlation of each input variable with the output variable.

cor1 <- cor(AssignmentData$USGG3M, AssignmentData$Output1)
cor2 <- cor(AssignmentData$USGG6M, AssignmentData$Output1)
cor3 <- cor(AssignmentData$USGG2YR, AssignmentData$Output1)
cor4 <- cor(AssignmentData$USGG3YR, AssignmentData$Output1)
cor5 <- cor(AssignmentData$USGG5YR, AssignmentData$Output1)
cor6 <- cor(AssignmentData$USGG10YR, AssignmentData$Output1)
cor7 <- cor(AssignmentData$USGG30YR, AssignmentData$Output1)

Input.correlation <- c(cor1, cor2, cor3, cor4, cor5, cor6, cor7)
Input.correlation.matrix <- matrix(data = Input.correlation, nrow = 7, byrow = FALSE)
rownames(Input.correlation.matrix) <- c("Input.1","Input.2","Input.3","Input.4","Input.5","Input.6","Input.7")
colnames(Input.correlation.matrix) <- c("Correlation")
Input.correlation.matrix
##         Correlation
## Input.1   0.9812265
## Input.2   0.9871111
## Input.3   0.9983139
## Input.4   0.9989602
## Input.5   0.9958368
## Input.6   0.9844737
## Input.7   0.9671263

Analysis

All input variables are highly correlated with the output variable, with every correlation greater than 0.967. The strongest correlation comes from Input 4, the USGG3YR variable.
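As an aside, the seven separate cor() calls above can be collapsed into a single sapply() call; a sketch using simulated stand-in data (the real AssignmentData columns would drop in the same way):

```r
# Build a stand-in data frame with the same column layout as AssignmentData
# (hypothetical values; the real data would be used the same way).
set.seed(1)
df <- as.data.frame(matrix(rnorm(700), ncol = 7))
names(df) <- c("USGG3M","USGG6M","USGG2YR","USGG3YR","USGG5YR","USGG10YR","USGG30YR")
df$Output1 <- rowMeans(df) + rnorm(100, sd = 0.1)
# One call replaces cor1..cor7: correlate every input column with the output.
Input.correlation <- sapply(df[, 1:7], cor, y = df$Output1)
```

The result is a named vector, so the rownames()/colnames() bookkeeping above is no longer needed.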

STEP 2

Estimate a simple regression model with each of the input variables and the given output variable

Input1.linear.Model <- lm(Output1~USGG3M, AssignmentData)
Input2.linear.Model <- lm(Output1~USGG6M, AssignmentData)
Input3.linear.Model <- lm(Output1~USGG2YR, AssignmentData)
Input4.linear.Model <- lm(Output1~USGG3YR, AssignmentData)
Input5.linear.Model <- lm(Output1~USGG5YR, AssignmentData)
Input6.linear.Model <- lm(Output1~USGG10YR, AssignmentData)
Input7.linear.Model <- lm(Output1~USGG30YR, AssignmentData)

Storing each linear model’s coefficients.

Coefficients.Input1.Matrix <- t(sapply(1:7, function(x) lm(AssignmentData[,8]~AssignmentData[,x])$coefficients))

Coefficients.Input1.Matrix
##      (Intercept) AssignmentData[, x]
## [1,]   -11.72318            2.507561
## [2,]   -12.09753            2.497235
## [3,]   -13.05577            2.400449
## [4,]   -13.86162            2.455793
## [5,]   -15.43665            2.568742
## [6,]   -18.06337            2.786991
## [7,]   -21.08590            3.069561

Plot the output variable together with the fitted values.

matplot(AssignmentData[,8],type="l",xaxt="n",xlab = "Input1")
lines(Input1.linear.Model$fitted.values,col="red")

matplot(AssignmentData[,8],type="l",xaxt="n",xlab = "Input2")
lines(Input2.linear.Model$fitted.values,col="red")

matplot(AssignmentData[,8],type="l",xaxt="n",xlab = "Input3")
lines(Input3.linear.Model$fitted.values,col="red")

matplot(AssignmentData[,8],type="l",xaxt="n",xlab = "Input4")
lines(Input4.linear.Model$fitted.values,col="red")

matplot(AssignmentData[,8],type="l",xaxt="n",xlab = "Input5")
lines(Input5.linear.Model$fitted.values,col="red")

matplot(AssignmentData[,8],type="l",xaxt="n",xlab = "Input6")
lines(Input6.linear.Model$fitted.values,col="red")

matplot(AssignmentData[,8],type="l",xaxt="n",xlab = "Input7")
lines(Input7.linear.Model$fitted.values,col="red")

Analysis

At first glance, the linear models that fit best are USGG2YR and USGG3YR, with USGG3YR fitting best of all. This is consistent with the previous finding that USGG3YR has the highest correlation with the output variable.

Variance decomposition of the linear model (shown for the Input7 model)

c(Total.Variance=var(AssignmentData[,8]),
  Unexplained.Variance=summary(Input7.linear.Model)$sigma^2,
  Explained.Variance=var(AssignmentData[,8])-summary(Input7.linear.Model)$sigma^2)
##       Total.Variance Unexplained.Variance   Explained.Variance 
##            76.804438             4.967286            71.837152
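The decomposition above (Total = Unexplained + Explained, with unexplained variance taken as the residual variance summary(fit)$sigma^2) can be sketched on simulated data:

```r
# Variance decomposition for a simple linear fit, following the same
# convention as above: unexplained variance = summary(fit)$sigma^2.
set.seed(7)
x <- rnorm(200)
y <- 2 + 3 * x + rnorm(200)            # simulated, not the assignment data
fit <- lm(y ~ x)
total.var       <- var(y)
unexplained.var <- summary(fit)$sigma^2   # residual variance, SSE / df.residual
explained.var   <- total.var - unexplained.var
```

For a good fit, explained.var should be close to var(fitted(fit)), i.e. most of the output’s variance is captured by the regression line.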

STEP 3

Fit linear regression models using the single output (column 8, Output1) as the predictor and each of the original inputs as the response.

Coefficients.Input2.Matrix <- t(sapply(1:7, function(x) lm(AssignmentData[,x]~AssignmentData[,8])$coefficients))

Coefficients.Input2.Matrix
##      (Intercept) AssignmentData[, 8]
## [1,]    4.675134           0.3839609
## [2,]    4.844370           0.3901870
## [3,]    5.438888           0.4151851
## [4,]    5.644458           0.4063541
## [5,]    6.009421           0.3860610
## [6,]    6.481316           0.3477544
## [7,]    6.869355           0.3047124
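A useful sanity check tying Step 2 and Step 3 together (a sketch on simulated data): in simple regression, the forward slope times the reverse slope equals the squared correlation, so these reversed-regression coefficients are linked to the Step 2 slopes through the correlations from Step 1.

```r
# For simple regression, slope(y~x) * slope(x~y) = cor(x, y)^2, since
# slope(y~x) = cov(x,y)/var(x) and slope(x~y) = cov(x,y)/var(y).
set.seed(42)
x <- rnorm(500)
y <- 1 + 0.5 * x + rnorm(500)          # simulated stand-in data
b.forward <- coef(lm(y ~ x))[2]
b.reverse <- coef(lm(x ~ y))[2]
r2 <- cor(x, y)^2
```

With correlations near 1, as in this data set, forward and reverse slopes are close to being reciprocals of each other.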

STEP 4

Estimate logistic regression using all inputs and the data on FED tightening and easing cycles.

Prepare the easing-tightening data.

Make the easing column equal to 0 during the easing periods and NA otherwise. Make the tightening column equal to 1 during the tightening periods and NA otherwise.

AssignmentDataLogistic <- data.matrix(AssignmentData,rownames.force="automatic")
EasingPeriods <- AssignmentDataLogistic[,9]
EasingPeriods[AssignmentDataLogistic[,9]==1] <- 0
TighteningPeriods <- AssignmentDataLogistic[,10]
cbind(EasingPeriods,TighteningPeriods)[c(550:560,900:910,970:980),]
##            EasingPeriods TighteningPeriods
## 3/29/1983             NA                NA
## 3/30/1983             NA                NA
## 3/31/1983             NA                NA
## 4/4/1983              NA                 1
## 4/5/1983              NA                 1
## 4/6/1983              NA                 1
## 4/7/1983              NA                 1
## 4/8/1983              NA                 1
## 4/11/1983             NA                 1
## 4/12/1983             NA                 1
## 4/13/1983             NA                 1
## 8/27/1984             NA                 1
## 8/28/1984             NA                 1
## 8/29/1984             NA                 1
## 8/30/1984             NA                 1
## 8/31/1984             NA                 1
## 9/4/1984              NA                NA
## 9/5/1984              NA                NA
## 9/6/1984              NA                NA
## 9/7/1984              NA                NA
## 9/10/1984             NA                NA
## 9/11/1984             NA                NA
## 12/10/1984             0                NA
## 12/11/1984             0                NA
## 12/12/1984             0                NA
## 12/13/1984             0                NA
## 12/14/1984             0                NA
## 12/17/1984             0                NA
## 12/18/1984             0                NA
## 12/19/1984             0                NA
## 12/20/1984             0                NA
## 12/21/1984             0                NA
## 12/24/1984             0                NA

Remove the periods of neither easing nor tightening.

Plot the data and the binary output variable representing easing (0) and tightening (1) periods. Column 10 now has the binary output (0 or 1).

All.NAs <- is.na(EasingPeriods)&is.na(TighteningPeriods)
AssignmentDataLogistic.EasingTighteningOnly <- AssignmentDataLogistic
AssignmentDataLogistic.EasingTighteningOnly[,9] <- EasingPeriods
AssignmentDataLogistic.EasingTighteningOnly <- AssignmentDataLogistic.EasingTighteningOnly[!All.NAs,]
AssignmentDataLogistic.EasingTighteningOnly[is.na(AssignmentDataLogistic.EasingTighteningOnly[,10]),10]<-0

matplot(AssignmentDataLogistic.EasingTighteningOnly[,-c(9,10)],type="l",ylab="Data and Binary Fed Mode")
lines(AssignmentDataLogistic.EasingTighteningOnly[,10]*20,col="red")

Analysis

The visualization above shows three tightening periods over the period covered by the data. Visually, the output variable does generally increase during these tightening periods. Tightening periods are generally times when policies are put in place to curb an economy seen as growing too quickly; easing periods are the opposite, times when interest rates are lowered to stimulate the economy.

Estimate logistic regression with 3M yields as predictors for easing/tightening output.

LogisticModel.TighteningEasing_3M <- glm(AssignmentDataLogistic.EasingTighteningOnly[,10]~
                                         AssignmentDataLogistic.EasingTighteningOnly[,1],family=binomial(link=logit))
summary(LogisticModel.TighteningEasing_3M)$coefficients
##                                                    Estimate Std. Error
## (Intercept)                                      -2.1525619 0.17328095
## AssignmentDataLogistic.EasingTighteningOnly[, 1]  0.1863761 0.02143723
##                                                     z value     Pr(>|z|)
## (Intercept)                                      -12.422380 1.975941e-35
## AssignmentDataLogistic.EasingTighteningOnly[, 1]   8.694041 3.497715e-18
matplot(AssignmentDataLogistic.EasingTighteningOnly[,-c(9,10)],type="l",ylab="Data and Fitted Values")
lines(AssignmentDataLogistic.EasingTighteningOnly[,10]*20,col="red")
lines(LogisticModel.TighteningEasing_3M$fitted.values*20,col="green")

Now use all inputs as predictors for logistic regression.

LogisticModel.TighteningEasing_All <- glm(AssignmentDataLogistic.EasingTighteningOnly[,10]~
                                            AssignmentDataLogistic.EasingTighteningOnly[,1:7],family=binomial(link=logit))

summary(LogisticModel.TighteningEasing_All)
## 
## Call:
## glm(formula = AssignmentDataLogistic.EasingTighteningOnly[, 10] ~ 
##     AssignmentDataLogistic.EasingTighteningOnly[, 1:7], family = binomial(link = logit))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2113  -0.8595  -0.5935   1.1306   2.5530  
## 
## Coefficients:
##                                                            Estimate
## (Intercept)                                                 -4.7552
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3M    -3.3456
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG6M     4.1559
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG2YR    3.9460
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3YR   -3.4642
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG5YR   -3.2115
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG10YR  -0.9705
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG30YR   3.3254
##                                                            Std. Error
## (Intercept)                                                    0.4312
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3M       0.2666
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG6M       0.3748
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG2YR      0.7554
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3YR      0.9340
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG5YR      0.7795
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG10YR     0.9764
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG30YR     0.6138
##                                                            z value
## (Intercept)                                                -11.029
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3M   -12.548
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG6M    11.089
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG2YR    5.224
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3YR   -3.709
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG5YR   -4.120
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG10YR  -0.994
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG30YR   5.418
##                                                            Pr(>|z|)    
## (Intercept)                                                 < 2e-16 ***
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3M    < 2e-16 ***
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG6M    < 2e-16 ***
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG2YR  1.75e-07 ***
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3YR  0.000208 ***
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG5YR  3.79e-05 ***
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG10YR 0.320214    
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG30YR 6.04e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2983.5  on 2357  degrees of freedom
## Residual deviance: 2629.6  on 2350  degrees of freedom
## AIC: 2645.6
## 
## Number of Fisher Scoring iterations: 4
summary(LogisticModel.TighteningEasing_All)$coefficients[,c(1,4)]
##                                                              Estimate
## (Intercept)                                                -4.7551928
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3M   -3.3456116
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG6M    4.1558535
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG2YR   3.9460296
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3YR  -3.4642455
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG5YR  -3.2115319
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG10YR -0.9705444
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG30YR  3.3253517
##                                                                Pr(>|z|)
## (Intercept)                                                2.784283e-28
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3M   4.073045e-36
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG6M   1.422964e-28
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG2YR  1.751687e-07
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG3YR  2.080617e-04
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG5YR  3.786229e-05
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG10YR 3.202140e-01
## AssignmentDataLogistic.EasingTighteningOnly[, 1:7]USGG30YR 6.036041e-08
matplot(AssignmentDataLogistic.EasingTighteningOnly[,-c(9,10)],type="l",ylab="Results of Logistic Regression")
lines(AssignmentDataLogistic.EasingTighteningOnly[,10]*20,col="red")
lines(LogisticModel.TighteningEasing_All$fitted.values*20,col="green")

Analysis

Interpret the coefficients of the model and the fitted values.

We can see the coefficients of the logistic regression by using the summary function. The first coefficient is the intercept: after estimating the logistic regression using all input variables as predictors, the intercept (beta-0) estimate is -4.7552. This means that the log-odds of a day being in a tightening period when all input variables (yields to maturity) are 0 is -4.7552. The negative value means that such a day is more likely to be in an easing period than a tightening period, which is consistent with the tightening/easing visualization shown previously: there are more days in easing periods than in tightening periods.

The remaining coefficients give the expected change in log-odds for a one-unit increase in each input variable. A positive value indicates that a one-unit increase in that variable is associated with increased odds of tightening. Inputs 2 (USGG6M), 3 (USGG2YR) and 7 (USGG30YR) have positive coefficients, meaning a day is more likely to be in a tightening period when these three variables are higher. Among those three input variables, input 2 (USGG6M) has the largest expected change in log-odds per one-unit increase, with a coefficient of 4.16. The summary statistics also tell us that all input variables are statistically significant except input 6 (USGG10YR).

Looking more closely at the fitted values.

Plotting the fitted values against the Output Variable.

plot(AssignmentDataLogistic.EasingTighteningOnly[,8], LogisticModel.TighteningEasing_All$fitted.values)

Based on this plot, higher fitted probabilities of a day being in a tightening period occur more often when the output variable is higher (above a value of about 10). There is a cluster of fitted values between 0 and 0.6 for output values below 10. Overall, a day is more likely to be in a tightening period when the output variable is higher than usual.

Plotting the fitted values against tightening (1) or easing periods (0).

plot(AssignmentDataLogistic.EasingTighteningOnly[,10], LogisticModel.TighteningEasing_All$fitted.values)

This plot is a little more challenging to analyze because of the large number of observations on both sides. Looking closely, though, the minimum and maximum fitted values for the tightening period are higher than for easing periods. In addition, fitted values cluster (darker regions) between 0.0 and 0.6 for days in an easing period, while they cluster in a higher range, roughly 0.2 to 0.75, for days in a tightening period.

Histogram of the fitted values, split by tightening and easing periods.

y <- LogisticModel.TighteningEasing_All$fitted.values
period <- ifelse(AssignmentDataLogistic.EasingTighteningOnly[,10]==1, "Tightening", "Easing")
fitted.values.logistic <- data.frame(period,y)

library(ggplot2)
ggplot(data = fitted.values.logistic, aes(x=y, fill=period)) +
  geom_histogram(bins = 20)

The histogram above shows how the fitted values are distributed for easing and tightening periods. The fitted values for days in an easing period are positively skewed, while those for days in a tightening period look closer to a normal distribution. The mean for tightening periods occurs at around 0.375, versus around 0.300 for easing periods. Looking at the right tail of the distribution (fitted values of 0.5 and above), there is a higher proportion of tightening days than easing days, which tells me the model is doing a decent job of not classifying easing days as tightening days (false positives). Overall, with more refinement I would like to see the peak for tightening periods shift further to the right.

Calculate and plot log-odds and probabilities. Compare probabilities with fitted values.

Log.Odds <- predict(LogisticModel.TighteningEasing_All)
plot(Log.Odds,type="l")

Probabilities <- 1/(exp(-Log.Odds)+1)
plot(LogisticModel.TighteningEasing_All$fitted.values,type="l",ylab="Fitted Values & Probabilities")
lines(Probabilities,col="red")

Analysis

Comparing the two plots of log-odds and probabilities makes it clear that the fitted values coming from the logistic regression are simply the inverse-logit transformation of the log-odds, i.e. the estimated probability of the event occurring. In this instance, the event is defined as a day being in a tightening period.
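This equivalence can be verified directly: the fitted values of a binomial glm() are the inverse logit of the linear predictor, and 1/(1 + exp(-x)) is the same function R exposes as plogis(). A sketch on simulated data:

```r
# Simulated logistic data (stand-in for the assignment data).
set.seed(3)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(-1 + 2 * x))
m <- glm(y ~ x, family = binomial(link = logit))
log.odds <- predict(m)                 # linear predictor = log-odds
probs    <- 1 / (exp(-log.odds) + 1)   # manual inverse logit, as in the text
# probs agrees with both fitted(m) and plogis(log.odds).
```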

STEP 5

Compare linear regression models with different combinations of predictors. Select the best combination.

Below we show only two of the possible combinations: the full model containing all 7 predictors and the null model containing only an intercept and none of the 7 predictors. Estimate other possible combinations.

AssignmentDataRegressionComparison <- AssignmentData[,-c(9,10)]

Estimate the full model by using all 7 predictors.

RegressionModelComparison.Full <- lm(Output1~., data = AssignmentDataRegressionComparison)

Look at coefficients, R2, adjusted R2, degrees of freedom.

  1. Coefficients:
summary(RegressionModelComparison.Full)$coefficients[,c(1,2,3,4)]
##                Estimate   Std. Error       t value Pr(>|t|)
## (Intercept) -14.9041591 1.056850e-10 -141024294891        0
## USGG3M        0.3839609 9.860401e-11    3893968285        0
## USGG6M        0.3901870 1.500111e-10    2601053702        0
## USGG2YR       0.4151851 2.569451e-10    1615851177        0
## USGG3YR       0.4063541 3.299038e-10    1231735395        0
## USGG5YR       0.3860610 2.618339e-10    1474449865        0
## USGG10YR      0.3477544 2.800269e-10    1241860763        0
## USGG30YR      0.3047124 1.566487e-10    1945195584        0
  2. R-squared
summary(RegressionModelComparison.Full)$r.squared
## [1] 1

Adjusted R-squared

summary(RegressionModelComparison.Full)$adj.r.squared
## [1] 1
  3. Degrees of freedom:
summary(RegressionModelComparison.Full)$df
## [1]    8 8292    8

Analysis

Interpret the fitted model. How good is the fit? How significant are the parameters?

After fitting the linear model with all inputs, we can look at the summary of the model to see how well it fits. First, the coefficients and their statistical significance: the multiple regression model estimates the intercept (beta-0) as -14.9, while the estimated coefficients of the input variables range from 0.304 (USGG30YR) to 0.415 (USGG2YR). According to the summary, all variables are statistically significant. One way to assess the fit is the r-squared/adjusted r-squared, which lies between 0 and 1; generally, a higher value indicates a better fit. The r-squared and adjusted r-squared for this model are both exactly 1, meaning that 100% of the variation in the output variable is explained by the model. One possible contributor is the multicollinearity present among the independent variables (see below), although an r-squared of exactly 1 more likely indicates that the output variable is an exact linear combination of the inputs.
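An R-squared of exactly 1 arises when the response is a noiseless linear combination of the predictors; a sketch with hypothetical coefficients:

```r
# Response built as an exact linear combination of the columns: no noise term.
set.seed(8)
X <- matrix(rnorm(300), ncol = 3)
colnames(X) <- c("x1", "x2", "x3")
y <- -15 + X %*% c(0.4, 0.4, 0.3)
fit <- lm(y ~ X)
summary(fit)$r.squared   # 1 up to floating point
```

The residuals here are at machine precision, which is also why the reported standard errors in such a fit become meaninglessly tiny.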

Checking for multicollinearity

Plot of input/output variables against each other.

plot(AssignmentDataRegressionComparison[1:8])

Correlation matrix of the input variables.

cor(AssignmentDataRegressionComparison[1:7])
##             USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR
## USGG3M   1.0000000 0.9979328 0.9843135 0.9768273 0.9603153 0.9348717
## USGG6M   0.9979328 1.0000000 0.9910013 0.9845272 0.9691905 0.9444964
## USGG2YR  0.9843135 0.9910013 1.0000000 0.9987920 0.9916366 0.9753836
## USGG3YR  0.9768273 0.9845272 0.9987920 1.0000000 0.9963859 0.9836936
## USGG5YR  0.9603153 0.9691905 0.9916366 0.9963859 1.0000000 0.9948318
## USGG10YR 0.9348717 0.9444964 0.9753836 0.9836936 0.9948318 1.0000000
## USGG30YR 0.9069437 0.9167172 0.9539191 0.9648236 0.9819750 0.9956347
##           USGG30YR
## USGG3M   0.9069437
## USGG6M   0.9167172
## USGG2YR  0.9539191
## USGG3YR  0.9648236
## USGG5YR  0.9819750
## USGG10YR 0.9956347
## USGG30YR 1.0000000

Analysis

Checking for multicollinearity means checking how correlated the independent variables are with each other. The pairs plot and the correlation matrix above make it clear that all input variables are highly correlated with one another.

Estimate the Null model by including only intercept.

RegressionModelComparison.Null <- lm(Output1~1, data = AssignmentDataRegressionComparison)
  1. Coefficients:
summary(RegressionModelComparison.Null)$coefficients[,c(1,2,3,4)]
##     Estimate   Std. Error      t value     Pr(>|t|) 
## 1.420082e-11 9.619536e-02 1.476248e-10 1.000000e+00
  2. R-squared
summary(RegressionModelComparison.Null)$r.squared
## [1] 0

Adjusted R-squared

summary(RegressionModelComparison.Null)$adj.r.squared
## [1] 0
  3. Degrees of freedom:
summary(RegressionModelComparison.Null)$df
## [1]    1 8299    1

Analysis

Why does summary(RegressionModelComparison.Null) not show R2?

After fitting the model with only an intercept and no inputs, the r-squared/adjusted r-squared value is 0. An intercept-only model predicts the mean of the output variable for every observation, so the residual sum of squares (SSE) equals the total sum of squares (SST). When r-squared is calculated as 1 - (SSE/SST), this produces a value of 0; and since there is no F-statistic to report for an intercept-only model, the printed summary omits the R2 line entirely.
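The SSE = SST argument can be checked numerically with a sketch on simulated data:

```r
# Intercept-only model: the single fitted value is the sample mean, so the
# residual sum of squares equals the total sum of squares and R^2 = 0.
set.seed(5)
y <- rnorm(100, mean = 10)            # simulated stand-in output
null.fit <- lm(y ~ 1)
sse <- sum(residuals(null.fit)^2)
sst <- sum((y - mean(y))^2)
r.squared <- 1 - sse / sst            # 0 up to floating point
```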

Compare models pairwise using anova()

anova(RegressionModelComparison.Full,RegressionModelComparison.Null)
## Analysis of Variance Table
## 
## Model 1: Output1 ~ USGG3M + USGG6M + USGG2YR + USGG3YR + USGG5YR + USGG10YR + 
##     USGG30YR
## Model 2: Output1 ~ 1
##   Res.Df    RSS Df Sum of Sq        F    Pr(>F)    
## 1   8292      0                                    
## 2   8299 637400 -7   -637400 3.73e+22 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis

Interpret the results of anova().

ANOVA (analysis of variance) was used to compare the full model and the null model. The comparison shows that the reduction in the residual sum of squares from the null model to the full model is statistically significant, i.e. the predictors jointly explain a significant share of the variance.
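The F statistic reported by anova() can be reproduced by hand from the two residual sums of squares; a sketch on simulated data (the same arithmetic applies to the full/null comparison above):

```r
# F = ((RSS_reduced - RSS_full) / extra df) / (RSS_full / residual df of full).
set.seed(9)
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- 1 + 2 * x1 + rnorm(100)         # simulated stand-in data
full <- lm(y ~ x1 + x2)
null <- lm(y ~ 1)
rss.full <- sum(residuals(full)^2)
rss.null <- sum(residuals(null)^2)
F.manual <- ((rss.null - rss.full) / 2) / (rss.full / full$df.residual)
F.anova  <- anova(null, full)$F[2]
```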

Repeat the analysis for different combinations of input variables and select the one you think is the best.

Explain your selection.

To start off the analysis, drop1() will be used to identify which independent variable(s) to remove from the model.

drop.method <- lm(Output1~.,data=AssignmentDataRegressionComparison)
drop1(drop.method,scope= c("USGG3M","USGG6M","USGG2YR","USGG3YR","USGG5YR","USGG10YR","USGG30YR"))
## Warning: attempting model selection on an essentially perfect fit is
## nonsense
## Single term deletions
## 
## Model:
## Output1 ~ USGG3M + USGG6M + USGG2YR + USGG3YR + USGG5YR + USGG10YR + 
##     USGG30YR
##          Df Sum of Sq    RSS     AIC
## <none>                 0.000 -336591
## USGG3M    1    37.016 37.016  -44911
## USGG6M    1    16.516 16.516  -51609
## USGG2YR   1     6.374  6.374  -59512
## USGG3YR   1     3.704  3.704  -64018
## USGG5YR   1     5.307  5.307  -61032
## USGG10YR  1     3.765  3.765  -63882
## USGG30YR  1     9.237  9.237  -56433

Analysis

The warning that the model is “essentially a perfect fit” tells us that model selection via drop1() is not meaningful here: with a residual sum of squares of essentially zero, removing any variable can only worsen the fit, so drop1() simply points back to the full model.

Now, the add method will be used to see which are the most significant variables.

add.method <- lm(Output1~1,data=AssignmentDataRegressionComparison)
add1(add.method,scope=c("USGG3M","USGG6M","USGG2YR","USGG3YR","USGG5YR","USGG10YR","USGG30YR"))
## Single term additions
## 
## Model:
## Output1 ~ 1
##          Df Sum of Sq    RSS    AIC
## <none>                637400  36033
## USGG3M    1    613692  23708   8715
## USGG6M    1    621075  16325   5618
## USGG2YR   1    635252   2148 -11217
## USGG3YR   1    636075   1325 -15226
## USGG5YR   1    632104   5296  -3725
## USGG10YR  1    617761  19639   7153
## USGG30YR  1    596181  41219  13306

Check the r-squared value of a linear model with only the variable that had the least RSS (USGG3YR).

comb1 <- lm(Output1 ~ USGG3YR, AssignmentDataRegressionComparison)
summary(comb1)$r.squared
## [1] 0.9979215

Conduct the add method again, starting from the model containing USGG3YR.

add1(comb1,scope=c("USGG3M","USGG6M","USGG2YR","USGG5YR","USGG10YR","USGG30YR"))
## Single term additions
## 
## Model:
## Output1 ~ USGG3YR
##          Df Sum of Sq     RSS    AIC
## <none>                1324.83 -15226
## USGG3M    1    407.98  916.85 -18279
## USGG6M    1    270.17 1054.66 -17117
## USGG2YR   1     82.93 1241.91 -15761
## USGG5YR   1     20.94 1303.89 -15356
## USGG10YR  1     64.05 1260.78 -15636
## USGG30YR  1    100.79 1224.04 -15881

Check the r-squared value of a linear model with both USGG3YR and USGG3M.

comb2 <- lm(Output1 ~ USGG3YR + USGG3M, AssignmentDataRegressionComparison)
summary(comb2)$r.squared
## [1] 0.9985616

Analysis

After conducting the add method, USGG3YR is the variable selected first, with the lowest RSS and an r-squared of 0.9979. Adding USGG3M to USGG3YR produced only a slight increase in r-squared, to 0.9986. Since additional predictors add little value beyond USGG3YR, the best model is the one using only this input.
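The manual add1()/drop1() search above can also be automated with step(), which repeats the same single-term additions using AIC; a sketch on simulated data (variable names x1..x3 are hypothetical):

```r
# Forward stepwise selection with step(), silenced with trace = 0.
set.seed(11)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
d$y <- 1 + 2 * d$x1 + 0.5 * d$x2 + rnorm(200)   # x3 is pure noise
base <- lm(y ~ 1, data = d)
best <- step(base, scope = ~ x1 + x2 + x3, direction = "forward", trace = 0)
```

With a strong signal on x1, step() will at minimum select x1; the exact final model depends on the noise and on AIC’s penalty.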

STEP 6

Perform rolling window analysis of the yields data. Use package zoo for rolling window analysis.

Set the window width and window shift parameters for the rolling window. Compute rolling mean values using rollapply().

Window.width <- 20; Window.shift <- 5
library(zoo)
## Warning: package 'zoo' was built under R version 3.4.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

Calculate rolling mean values for each variable.

all.means<-rollapply(AssignmentDataRegressionComparison,width=Window.width,by=Window.shift,by.column=TRUE, mean)
head(all.means,10)
##        USGG3M  USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR  Output1
##  [1,] 15.0405 14.0855 13.2795 12.9360 12.7825  12.5780  12.1515 20.14842
##  [2,] 15.1865 14.1440 13.4855 13.1085 12.9310  12.7370  12.3370 20.55208
##  [3,] 15.2480 14.2755 13.7395 13.3390 13.1500  12.9480  12.5500 21.04895
##  [4,] 14.9345 14.0780 13.7750 13.4765 13.2385  13.0515  12.6610 21.02611
##  [5,] 14.7545 14.0585 13.9625 13.6890 13.4600  13.2295  12.8335 21.31356
##  [6,] 14.6025 14.0115 14.0380 13.7790 13.5705  13.3050  12.8890 21.39061
##  [7,] 14.0820 13.5195 13.8685 13.6710 13.4815  13.1880  12.7660 20.77200
##  [8,] 13.6255 13.0635 13.6790 13.5735 13.4270  13.1260  12.6950 20.23626
##  [9,] 13.2490 12.6810 13.5080 13.4575 13.3680  13.0770  12.6470 19.76988
## [10,] 12.9545 12.4225 13.4140 13.4240 13.3850  13.1115  12.6755 19.53054

Create points at which rolling means are calculated

Count <- 1:length(AssignmentDataRegressionComparison[,1])
Rolling.window.matrix <- rollapply(Count,width=Window.width,by=Window.shift,by.column=FALSE,
          FUN=function(z) z)
Rolling.window.matrix[1:10,]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]    1    2    3    4    5    6    7    8    9    10    11    12    13
##  [2,]    6    7    8    9   10   11   12   13   14    15    16    17    18
##  [3,]   11   12   13   14   15   16   17   18   19    20    21    22    23
##  [4,]   16   17   18   19   20   21   22   23   24    25    26    27    28
##  [5,]   21   22   23   24   25   26   27   28   29    30    31    32    33
##  [6,]   26   27   28   29   30   31   32   33   34    35    36    37    38
##  [7,]   31   32   33   34   35   36   37   38   39    40    41    42    43
##  [8,]   36   37   38   39   40   41   42   43   44    45    46    47    48
##  [9,]   41   42   43   44   45   46   47   48   49    50    51    52    53
## [10,]   46   47   48   49   50   51   52   53   54    55    56    57    58
##       [,14] [,15] [,16] [,17] [,18] [,19] [,20]
##  [1,]    14    15    16    17    18    19    20
##  [2,]    19    20    21    22    23    24    25
##  [3,]    24    25    26    27    28    29    30
##  [4,]    29    30    31    32    33    34    35
##  [5,]    34    35    36    37    38    39    40
##  [6,]    39    40    41    42    43    44    45
##  [7,]    44    45    46    47    48    49    50
##  [8,]    49    50    51    52    53    54    55
##  [9,]    54    55    56    57    58    59    60
## [10,]    59    60    61    62    63    64    65

Take middle of each window

Points.of.calculation <- Rolling.window.matrix[,10]
Points.of.calculation[1:10]
##  [1] 10 15 20 25 30 35 40 45 50 55
length(Points.of.calculation)
## [1] 1657

Insert means into the total length vector to plot the rolling mean with the original data

Means.forPlot <- rep(NA,length(AssignmentDataRegressionComparison[,1]))
Means.forPlot[Points.of.calculation] <- all.means[,1]
Means.forPlot[1:50]
##  [1]      NA      NA      NA      NA      NA      NA      NA      NA
##  [9]      NA 15.0405      NA      NA      NA      NA 15.1865      NA
## [17]      NA      NA      NA 15.2480      NA      NA      NA      NA
## [25] 14.9345      NA      NA      NA      NA 14.7545      NA      NA
## [33]      NA      NA 14.6025      NA      NA      NA      NA 14.0820
## [41]      NA      NA      NA      NA 13.6255      NA      NA      NA
## [49]      NA 13.2490

Assemble the matrix to plot the rolling means

cbind(AssignmentDataRegressionComparison[,1],Means.forPlot)[1:50,]
##             Means.forPlot
##  [1,] 13.52            NA
##  [2,] 13.58            NA
##  [3,] 14.50            NA
##  [4,] 14.76            NA
##  [5,] 15.20            NA
##  [6,] 15.22            NA
##  [7,] 15.24            NA
##  [8,] 15.08            NA
##  [9,] 15.25            NA
## [10,] 15.15       15.0405
## [11,] 15.79            NA
## [12,] 15.32            NA
## [13,] 15.71            NA
## [14,] 15.72            NA
## [15,] 15.70       15.1865
## [16,] 15.35            NA
## [17,] 15.20            NA
## [18,] 15.06            NA
## [19,] 14.87            NA
## [20,] 14.59       15.2480
## [21,] 14.90            NA
## [22,] 14.85            NA
## [23,] 14.67            NA
## [24,] 14.74            NA
## [25,] 15.32       14.9345
## [26,] 15.52            NA
## [27,] 15.46            NA
## [28,] 15.54            NA
## [29,] 15.51            NA
## [30,] 15.14       14.7545
## [31,] 15.02            NA
## [32,] 14.48            NA
## [33,] 14.09            NA
## [34,] 14.23            NA
## [35,] 14.15       14.6025
## [36,] 14.20            NA
## [37,] 14.14            NA
## [38,] 14.22            NA
## [39,] 14.52            NA
## [40,] 14.39       14.0820
## [41,] 14.49            NA
## [42,] 14.51            NA
## [43,] 14.29            NA
## [44,] 14.16            NA
## [45,] 13.99       13.6255
## [46,] 13.92            NA
## [47,] 13.66            NA
## [48,] 13.21            NA
## [49,] 13.02            NA
## [50,] 12.95       13.2490
plot(Means.forPlot,col="red")
lines(AssignmentDataRegressionComparison[,1])

Compute the rolling daily difference of each variable.

all.dailydiff <- rollapply(AssignmentDataRegressionComparison,width=2,by=1,by.column=TRUE,diff)
head(all.dailydiff,10)
##       USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR     Output1
##  [1,]   0.06   0.07    0.14    0.03   -0.08    -0.04     0.00  0.07587222
##  [2,]   0.92   0.74    0.50    0.47    0.40     0.27     0.22  1.35591615
##  [3,]   0.26   0.10    0.17    0.17    0.07    -0.03     0.02  0.30119608
##  [4,]   0.44   0.30    0.44    0.33    0.20     0.22     0.22  0.82353207
##  [5,]   0.02  -0.07   -0.36   -0.34   -0.17    -0.12    -0.05 -0.42985741
##  [6,]   0.02  -0.13    0.13    0.03   -0.03     0.08     0.00  0.03935812
##  [7,]  -0.16  -0.20   -0.35   -0.22   -0.07     0.00    -0.01 -0.40425521
##  [8,]   0.17   0.19    0.30    0.27    0.16     0.09     0.18  0.52159590
##  [9,]  -0.10  -0.11   -0.17   -0.17   -0.11    -0.09    -0.12 -0.33130842
## [10,]   0.64   0.75    0.54    0.07    0.26     0.11     0.12  0.96621424
rolling.sd <- rollapply(all.dailydiff,width=Window.width,by=Window.shift,by.column=TRUE, sd)
head(rolling.sd)
##         USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR  USGG30YR
## [1,] 0.3433258 0.3262462 0.2748258 0.2030161 0.1713192 0.1299585 0.1202147
## [2,] 0.2933383 0.2907504 0.2261811 0.1499219 0.1450082 0.1146895 0.1192201
## [3,] 0.2613180 0.2437530 0.2006201 0.1632596 0.1654110 0.1459308 0.1351909
## [4,] 0.2551754 0.2469663 0.1989446 0.1692794 0.1717219 0.1551052 0.1422183
## [5,] 0.2480551 0.2481595 0.2102004 0.1786057 0.1744767 0.1643960 0.1516540
## [6,] 0.1963884 0.2363672 0.2095082 0.1809180 0.1822917 0.1664956 0.1537351
##        Output1
## [1,] 0.5639875
## [2,] 0.4707427
## [3,] 0.4681168
## [4,] 0.4786189
## [5,] 0.4888569
## [6,] 0.4788897
rolling.dates <- rollapply(AssignmentDataRegressionComparison[-1,],width=Window.width,by=Window.shift,
                         by.column=FALSE,FUN=function(z) rownames(z))
rownames(rolling.sd) <- rolling.dates[,10]

matplot(rolling.sd[,c(1,5,7,8)],xaxt="n",type="l",col=c("black","red","blue","green"))
axis(side=1,at=1:1656,rownames(rolling.sd))

Show periods of high volatility.

high.volatility.periods <- rownames(rolling.sd)[rolling.sd[,8]>.5]
high.volatility.periods
##  [1] "1/19/1981"  "3/25/1981"  "4/1/1981"   "4/8/1981"   "4/23/1981" 
##  [6] "4/30/1981"  "5/7/1981"   "5/14/1981"  "5/21/1981"  "5/29/1981" 
## [11] "6/5/1981"   "6/12/1981"  "6/19/1981"  "6/26/1981"  "7/6/1981"  
## [16] "7/13/1981"  "7/20/1981"  "7/27/1981"  "10/28/1981" "11/5/1981" 
## [21] "11/13/1981" "11/20/1981" "11/30/1981" "12/7/1981"  "12/14/1981"
## [26] "12/29/1981" "1/14/1982"  "1/21/1982"  "1/28/1982"  "2/4/1982"  
## [31] "2/11/1982"  "7/29/1982"  "8/5/1982"   "8/12/1982"  "8/19/1982" 
## [36] "8/26/1982"  "9/24/1982"  "10/1/1982"  "10/8/1982"  "10/18/1982"
## [41] "10/13/1987" "10/20/1987" "10/27/1987" "11/19/2007" "11/26/2007"
## [46] "11/12/2008" "11/19/2008"

Analysis

The high-volatility windows cluster in 1981-1982, around October 1987, and in late 2007-2008. These clusters line up with the Volcker-era interest rate swings, the October 1987 stock market crash, and the financial crisis, all periods with unusually large daily moves in yields.

Fit linear model to rolling window data using the 3 month, 5 year and 30 year variables as predictors.

Coefficients <- rollapply(AssignmentDataRegressionComparison,width=Window.width,by=Window.shift,by.column=FALSE,
         FUN=function(z) coef(lm(Output1~USGG3M+USGG5YR+USGG30YR,data=as.data.frame(z))))

rolling.dates <- rollapply(AssignmentDataRegressionComparison[,1:8],width=Window.width,by=Window.shift,by.column=FALSE,
                         FUN=function(z) rownames(z))

rownames(Coefficients) <- rolling.dates[,10]
Coefficients[1:10,]
##           (Intercept)    USGG3M  USGG5YR  USGG30YR
## 1/16/1981   -15.70877 0.6723609 1.797680 0.2276011
## 1/23/1981   -15.96684 0.6948992 1.480514 0.5529139
## 1/30/1981   -16.77273 0.7078197 1.434388 0.6507280
## 2/6/1981    -16.90734 0.7279033 1.470083 0.6003377
## 2/17/1981   -17.46896 0.7343406 1.361289 0.7499705
## 2/24/1981   -17.04722 0.7357663 1.295641 0.7844908
## 3/3/1981    -17.67901 0.8544681 1.396653 0.5945022
## 3/10/1981   -17.72402 0.9162385 1.654274 0.2571200
## 3/17/1981   -17.00260 0.9265767 1.647852 0.1951273
## 3/24/1981   -16.84302 0.9102780 1.477727 0.3788401

Look at pairwise X-Y plots of regression coefficients for the 3M, 5Yr and 30Yr yields as inputs.

pairs(Coefficients)

Analysis

Interpret the pairs plot.

The pairs plot above tells a different story from the previous analysis, which looked at the data as a whole without accounting for periods of time. The pairs plot of the coefficients from fitting three predictors (USGG3M, USGG5YR, and USGG30YR) to the rolling window data reveals multicollinearity between the input variables. USGG3M does not show any meaningful relationship with the other two coefficients. However, the plot does show a strong negative relationship between the USGG5YR and USGG30YR coefficients.

Plot the coefficients. Show periods.

matplot(Coefficients[,-1],xaxt="n",type="l",col=c("black","red","green"))
axis(side=1,at=1:1657,rownames(Coefficients))

high.slopespread.periods <- rownames(Coefficients)[Coefficients[,3]-Coefficients[,4]>3]
jump.slopes <- rownames(Coefficients)[Coefficients[,3]>3]
high.slopespread.periods
##  [1] "4/26/1982"  "12/15/1982" "9/16/1985"  "5/12/1987"  "5/19/1987" 
##  [6] "5/27/1987"  "9/25/1987"  "3/15/1988"  "9/27/1988"  "10/5/1988" 
## [11] "3/10/1989"  "2/5/1992"   "8/3/1994"   "12/8/1994"  "6/14/1996" 
## [16] "5/9/1997"   "5/16/1997"  "5/23/1997"  "5/30/1997"  "12/26/2000"
## [21] "1/2/2001"   "7/25/2001"  "8/1/2001"   "11/13/2003" "8/12/2004" 
## [26] "12/16/2004"

Analysis

The dates shown above are the windows in which the estimated USGG5YR coefficient exceeds the estimated USGG30YR coefficient by more than 3. There are 26 such windows, the most recent on 12/16/2004.

jump.slopes
## [1] "12/16/2004"

Analysis

The date shown above is identified as a 'jump slope', a window in which the estimated USGG5YR coefficient itself exceeds 3. There was one such date, 12/16/2004.

Is the picture of coefficients consistent with the picture of pairs? If yes, explain why.

Comparing the picture of coefficients with the picture of pairs confirms the negative relationship between the USGG5YR and USGG30YR coefficients. In the coefficients plot, the USGG5YR coefficients are the red line and the USGG30YR coefficients are the green line. When the red line spikes upward, the green line tends to dip, and vice versa: the two series move in opposite directions. This negative relationship between the two lines validates the relationship observed in the pairs plot.

How often is the R-squared not considered high?

# R-squared
r.squared <- rollapply(AssignmentDataRegressionComparison,width=Window.width,by=Window.shift,by.column=FALSE,
         FUN=function(z) summary(lm(Output1~USGG3M+USGG5YR+USGG30YR,data=as.data.frame(z)))$r.squared)
r.squared <- cbind(rolling.dates[,10],r.squared)
r.squared[1:10,]
##                   r.squared          
##  [1,] "1/16/1981" "0.995046300986446"
##  [2,] "1/23/1981" "0.992485868136646"
##  [3,] "1/30/1981" "0.998641209587999"
##  [4,] "2/6/1981"  "0.998849080081881"
##  [5,] "2/17/1981" "0.997958757207598"
##  [6,] "2/24/1981" "0.996489757136839"
##  [7,] "3/3/1981"  "0.99779753570421" 
##  [8,] "3/10/1981" "0.998963395226792"
##  [9,] "3/17/1981" "0.998729445388789"
## [10,] "3/24/1981" "0.997073000898673"
plot(r.squared[,2],xaxt="n",ylim=c(0,1))
axis(side=1,at=1:1657,rownames(Coefficients))

Analysis

The plot above shows the r-squared values of the linear model fit on the rolling window data using USGG3M, USGG5YR, and USGG30YR as predictors. Based on this plot, the r-squared values tend to be very high, with most close to 1. Only a handful of windows have r-squared values that would not be considered "high" and could be treated as outliers.

(low.r.squared.periods<-r.squared[r.squared[,2]<.9,1])
## [1] "6/24/1987" "6/27/1991" "4/28/2005" "6/20/2012"

Analysis

What could cause decrease of R-squared?

The dates above are the rare windows that had r-squared values below 0.90. A lower r-squared value means a higher sum of squared errors: the model cannot fully explain the variation that is occurring. This suggests that some external factor not captured by the three yields affected the output variable during those particular days or periods, making it more difficult to fit.
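To quantify how rare these windows are, the values can be tallied directly. A small sketch, noting that the cbind() with the date strings above coerced r.squared to character:

```r
# r.squared[,2] is stored as character after the cbind() above,
# so convert back to numeric before comparing to the threshold.
r2 <- as.numeric(r.squared[, 2])
sum(r2 < 0.9)    # number of "low" windows
mean(r2 < 0.9)   # fraction of all rolling windows
```

Per the output above, only 4 of the 1,657 windows fall below 0.90.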

Analyze the rolling p-values.

# P-values
Pvalues <- rollapply(AssignmentDataRegressionComparison,width=Window.width,by=Window.shift,by.column=FALSE,
                        FUN=function(z) summary(lm(Output1~USGG3M+USGG5YR+USGG30YR,data=as.data.frame(z)))$coefficients[,4])
rownames(Pvalues) <- rolling.dates[,10]
Pvalues[1:10,]
##            (Intercept)       USGG3M      USGG5YR     USGG30YR
## 1/16/1981 1.193499e-10 3.764585e-10 2.391260e-07 2.538852e-01
## 1/23/1981 3.751077e-12 1.008053e-11 2.447369e-07 5.300949e-03
## 1/30/1981 3.106359e-18 1.406387e-14 4.040035e-09 3.626961e-05
## 2/6/1981  2.591522e-19 3.360104e-19 3.828054e-11 2.221691e-05
## 2/17/1981 1.897239e-16 6.578118e-17 1.461743e-09 1.331767e-04
## 2/24/1981 2.341158e-13 1.000212e-13 9.008221e-07 4.733543e-03
## 3/3/1981  5.435581e-14 1.535503e-11 3.357199e-06 6.010473e-02
## 3/10/1981 6.227624e-16 1.178498e-16 1.679479e-05 3.851840e-01
## 3/17/1981 9.592582e-17 7.065226e-20 1.459692e-05 5.025726e-01
## 3/24/1981 8.248747e-16 6.689840e-16 6.413371e-04 3.052705e-01
matplot(Pvalues,xaxt="n",col=c("black","blue","red","green"),type="o")
axis(side=1,at=1:1657,rownames(Coefficients))

rownames(Pvalues)[Pvalues[,2]>.5]
##  [1] "7/15/1992"  "7/26/1996"  "8/2/1996"   "11/7/2000"  "5/30/2001" 
##  [6] "5/2/2002"   "5/16/2002"  "5/23/2002"  "1/30/2003"  "2/6/2003"  
## [11] "7/24/2003"  "7/31/2003"  "8/7/2003"   "11/20/2003" "12/18/2003"
## [16] "4/28/2005"  "2/10/2006"  "3/9/2007"   "3/16/2007"  "7/21/2009" 
## [21] "10/6/2009"  "10/13/2009" "12/28/2010" "1/11/2011"  "3/1/2011"  
## [26] "11/16/2011" "11/23/2011" "5/23/2012"  "7/11/2012"  "6/6/2013"  
## [31] "1/16/2014"  "1/30/2014"  "3/6/2014"
rownames(Pvalues)[Pvalues[,3]>.5]
## [1] "12/1/1982" "3/16/1987" "4/28/1987" "6/24/1987" "9/3/1987"  "9/11/1987"
## [7] "9/20/1988" "12/3/1999"
rownames(Pvalues)[Pvalues[,4]>.5]
##   [1] "3/17/1981"  "4/22/1981"  "4/29/1981"  "6/4/1981"   "10/13/1981"
##   [6] "11/19/1981" "2/3/1982"   "2/26/1982"  "4/2/1982"   "4/12/1982" 
##  [11] "5/3/1982"   "7/7/1982"   "9/1/1982"   "9/23/1982"  "12/8/1982" 
##  [16] "2/18/1983"  "3/1/1983"   "5/4/1983"   "12/30/1983" "1/9/1984"  
##  [21] "2/6/1984"   "3/14/1984"  "3/21/1984"  "4/11/1984"  "6/22/1984" 
##  [26] "6/29/1984"  "10/31/1984" "11/16/1984" "11/26/1984" "12/17/1984"
##  [31] "3/25/1985"  "5/14/1985"  "5/21/1985"  "8/30/1985"  "9/9/1985"  
##  [36] "9/23/1985"  "10/1/1985"  "10/8/1985"  "10/16/1985" "10/23/1985"
##  [41] "10/30/1985" "11/14/1985" "11/21/1985" "1/22/1986"  "1/29/1986" 
##  [46] "5/5/1987"   "12/16/1987" "1/25/1988"  "2/1/1988"   "2/16/1988" 
##  [51] "3/1/1988"   "3/22/1988"  "5/18/1988"  "6/3/1988"   "6/10/1988" 
##  [56] "6/24/1988"  "7/25/1988"  "8/15/1988"  "12/5/1988"  "2/2/1989"  
##  [61] "3/3/1989"   "4/10/1989"  "5/1/1989"   "6/13/1989"  "8/16/1989" 
##  [66] "9/14/1989"  "9/21/1989"  "10/3/1989"  "10/11/1989" "10/18/1989"
##  [71] "11/1/1989"  "11/30/1989" "12/7/1989"  "1/8/1990"   "1/16/1990" 
##  [76] "6/15/1990"  "7/30/1990"  "8/6/1990"   "10/2/1990"  "10/10/1990"
##  [81] "11/23/1990" "3/1/1991"   "5/13/1991"  "5/20/1991"  "6/13/1991" 
##  [86] "7/5/1991"   "7/19/1991"  "9/16/1991"  "2/12/1992"  "2/20/1992" 
##  [91] "3/12/1992"  "4/16/1992"  "4/24/1992"  "5/1/1992"   "5/8/1992"  
##  [96] "6/2/1992"   "6/9/1992"   "6/16/1992"  "8/19/1992"  "8/26/1992" 
## [101] "10/23/1992" "5/20/1993"  "6/11/1993"  "6/18/1993"  "8/30/1993" 
## [106] "12/15/1993" "12/22/1993" "3/16/1994"  "3/30/1994"  "4/6/1994"  
## [111] "4/20/1994"  "4/27/1994"  "6/29/1994"  "8/17/1994"  "9/21/1994" 
## [116] "9/28/1994"  "12/22/1994" "12/29/1994" "1/5/1995"   "1/12/1995" 
## [121] "1/26/1995"  "2/2/1995"   "2/9/1995"   "4/6/1995"   "4/13/1995" 
## [126] "8/25/1995"  "9/29/1995"  "10/27/1995" "11/3/1995"  "11/10/1995"
## [131] "12/29/1995" "1/5/1996"   "1/12/1996"  "1/19/1996"  "3/29/1996" 
## [136] "5/31/1996"  "6/21/1996"  "7/12/1996"  "7/19/1996"  "7/26/1996" 
## [141] "8/2/1996"   "8/9/1996"   "8/16/1996"  "8/23/1996"  "9/27/1996" 
## [146] "10/4/1996"  "12/6/1996"  "2/28/1997"  "3/7/1997"   "4/18/1997" 
## [151] "4/25/1997"  "5/2/1997"   "6/13/1997"  "6/20/1997"  "6/27/1997" 
## [156] "7/4/1997"   "10/10/1997" "10/17/1997" "12/12/1997" "12/19/1997"
## [161] "12/26/1997" "1/9/1998"   "1/16/1998"  "8/14/1998"  "8/21/1998" 
## [166] "8/28/1998"  "9/18/1998"  "9/25/1998"  "12/4/1998"  "12/11/1998"
## [171] "1/8/1999"   "3/12/1999"  "4/2/1999"   "5/14/1999"  "6/25/1999" 
## [176] "7/9/1999"   "7/30/1999"  "8/20/1999"  "9/10/1999"  "9/24/1999" 
## [181] "10/15/1999" "12/31/1999" "3/3/2000"   "3/31/2000"  "4/7/2000"  
## [186] "4/14/2000"  "4/21/2000"  "7/25/2000"  "11/21/2000" "11/28/2000"
## [191] "3/7/2001"   "5/30/2001"  "7/11/2001"  "10/11/2001" "12/13/2001"
## [196] "1/24/2002"  "1/31/2002"  "8/29/2002"  "9/26/2002"  "12/26/2002"
## [201] "1/16/2003"  "1/23/2003"  "3/6/2003"   "3/13/2003"  "3/20/2003" 
## [206] "3/27/2003"  "5/1/2003"   "6/19/2003"  "6/26/2003"  "7/3/2003"  
## [211] "9/18/2003"  "9/25/2003"  "10/16/2003" "10/23/2003" "10/30/2003"
## [216] "11/20/2003" "1/1/2004"   "2/5/2004"   "2/26/2004"  "3/4/2004"  
## [221] "4/15/2004"  "4/22/2004"  "5/13/2004"  "5/27/2004"  "6/3/2004"  
## [226] "6/17/2004"  "6/24/2004"  "7/8/2004"   "10/14/2004" "10/21/2004"
## [231] "10/28/2004" "11/18/2004" "12/23/2004" "3/17/2005"  "3/24/2005" 
## [236] "3/31/2005"  "4/7/2005"   "7/21/2005"  "8/11/2005"  "9/22/2005" 
## [241] "10/6/2005"  "11/3/2005"  "12/8/2005"  "12/15/2005" "1/20/2006" 
## [246] "5/12/2006"  "5/19/2006"  "5/26/2006"  "6/2/2006"   "6/9/2006"  
## [251] "6/16/2006"  "7/7/2006"   "7/21/2006"  "10/6/2006"  "11/3/2006" 
## [256] "1/12/2007"  "2/23/2007"  "3/16/2007"  "4/27/2007"  "5/25/2007" 
## [261] "9/7/2007"   "9/14/2007"  "9/21/2007"  "9/28/2007"  "11/16/2007"
## [266] "5/5/2009"   "6/1/2010"   "6/15/2010"  "1/25/2011"  "2/1/2011"  
## [271] "6/12/2014"

Analysis

Interpret the plot.

The plot of rolling p-values allows us to compare the significance of the input variables over time. Based on the plot, USGG30YR (green line) appears to be the least significant of the three variables, with 271 windows having p-values greater than 0.50. USGG3M (blue line) is the next least significant, with 33 such windows. The most significant predictor is USGG5YR, with only 8 windows having p-values greater than 0.50. Visually, the red line tends to stay at the base of the graph, which confirms that USGG5YR is the most consistently significant predictor of the three.
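The per-variable counts quoted above can be reproduced directly from the Pvalues matrix:

```r
# Count windows where each slope's p-value exceeds 0.5
# (columns 2-4 of Pvalues are USGG3M, USGG5YR, USGG30YR).
colSums(Pvalues[, 2:4] > 0.5)
```

This should reproduce the 33, 8, and 271 counts implied by the date lists above.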

STEP 7

Perform PCA with the inputs (columns 1-7).

AssignmentData.Output <- AssignmentData$Output1
AssignmentData <- data.matrix(AssignmentData[,1:7],rownames.force="automatic")
dim(AssignmentData)
## [1] 8300    7
head(AssignmentData)
##           USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR
## 1/5/1981   13.52  13.09  12.289   12.28  12.294   12.152   11.672
## 1/6/1981   13.58  13.16  12.429   12.31  12.214   12.112   11.672
## 1/7/1981   14.50  13.90  12.929   12.78  12.614   12.382   11.892
## 1/8/1981   14.76  14.00  13.099   12.95  12.684   12.352   11.912
## 1/9/1981   15.20  14.30  13.539   13.28  12.884   12.572   12.132
## 1/12/1981  15.22  14.23  13.179   12.94  12.714   12.452   12.082

Explore the dimensionality of the set of 3M, 2Y and 5Y yields.

# Select 3 variables. Explore dimensionality and correlation 
AssignmentData.3M_2Y_5Y<-AssignmentData[,c(1,3,5)]
pairs(AssignmentData.3M_2Y_5Y)
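The instruction mentions correlation, but the correlations themselves are not printed above; a quick check:

```r
# Pairwise correlations of the 3M, 2Y and 5Y yields
cor(AssignmentData.3M_2Y_5Y)
```

Given the tight linear clouds in the pairs plot, these correlations should all be close to 1, which is what motivates reducing the dimensionality.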

Observe the 3D plot of the set. Use library rgl:

library("rgl")
## Warning: package 'rgl' was built under R version 3.4.4
rgl.points(AssignmentData.3M_2Y_5Y)
Covariance.Matrix <- cov(AssignmentData)
#Means
a <- mean(AssignmentData[,1])
b <- mean(AssignmentData[,2])
c <- mean(AssignmentData[,3])
d <- mean(AssignmentData[,4])
e <- mean(AssignmentData[,5])
f <- mean(AssignmentData[,6])
g <- mean(AssignmentData[,7])

k <- ncol(AssignmentData)
n <- nrow(AssignmentData)

Input.means <- c(a, b, c, d, e, f, g)
Input.means.matrix <- matrix(data = Input.means, nrow = n, byrow = FALSE)
## Warning in matrix(data = Input.means, nrow = n, byrow = FALSE): data length
## [7] is not a sub-multiple or multiple of the number of rows [8300]
Input.means.matrix <- matrix(data=1, nrow=n) %*% cbind(a,b,c,d,e,f,g) 
#centering means
diff.matrix <- AssignmentData - Input.means.matrix
Manual.Covariance.Matrix <- (t(diff.matrix)%*% diff.matrix)*((n-1)^-1)
Manual.Covariance.Matrix
##             USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR
## USGG3M   11.760393 11.855287 12.303031 11.942035 11.188856  9.924865
## USGG6M   11.855287 12.000510 12.512434 12.158422 11.406959 10.128890
## USGG2YR  12.303031 12.512434 13.284203 12.977542 12.279514 11.005377
## USGG3YR  11.942035 12.158422 12.977542 12.708647 12.068078 10.856033
## USGG5YR  11.188856 11.406959 12.279514 12.068078 11.543082 10.463386
## USGG10YR  9.924865 10.128890 11.005377 10.856033 10.463386  9.583483
## USGG30YR  8.587987  8.768702  9.600181  9.497246  9.212159  8.510632
##          USGG30YR
## USGG3M   8.587987
## USGG6M   8.768702
## USGG2YR  9.600181
## USGG3YR  9.497246
## USGG5YR  9.212159
## USGG10YR 8.510632
## USGG30YR 7.624304
Covariance.Matrix
##             USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR
## USGG3M   11.760393 11.855287 12.303031 11.942035 11.188856  9.924865
## USGG6M   11.855287 12.000510 12.512434 12.158422 11.406959 10.128890
## USGG2YR  12.303031 12.512434 13.284203 12.977542 12.279514 11.005377
## USGG3YR  11.942035 12.158422 12.977542 12.708647 12.068078 10.856033
## USGG5YR  11.188856 11.406959 12.279514 12.068078 11.543082 10.463386
## USGG10YR  9.924865 10.128890 11.005377 10.856033 10.463386  9.583483
## USGG30YR  8.587987  8.768702  9.600181  9.497246  9.212159  8.510632
##          USGG30YR
## USGG3M   8.587987
## USGG6M   8.768702
## USGG2YR  9.600181
## USGG3YR  9.497246
## USGG5YR  9.212159
## USGG10YR 8.510632
## USGG30YR 7.624304

Plot the covariance matrix.

Maturities<-c(.25,.5,2,3,5,10,30)
contour(Maturities,Maturities,Covariance.Matrix)

Perform the PCA by manually calculating factors, loadings and analyzing the importance of factors.

Find eigenvalues and eigenvectors. Calculate vector of means (zero loading), first 3 loadings and 3 factors.

Eigen.Decomposition.values <- eigen(Manual.Covariance.Matrix)$values
Eigen.Decomposition.vectors <- eigen(Manual.Covariance.Matrix)$vectors
Means <- apply(AssignmentData, 2, function(x) mean(x))
Principle.Components <- diff.matrix %*% Eigen.Decomposition.vectors

Loadings <- Eigen.Decomposition.vectors[,1:3]
Factors <- Principle.Components[,1:3]

barplot(Eigen.Decomposition.values/sum(Eigen.Decomposition.values),width=2,ylim = c(0,1),col = "black",
        names.arg=c("F1","F2","F3","F4","F5","F6","F7"))

Eigen.Decomposition.values/sum(Eigen.Decomposition.values)
## [1] 9.783429e-01 1.976344e-02 1.558895e-03 1.803130e-04 1.059986e-04
## [6] 2.864693e-05 1.981838e-05

Plot the loadings.

Maturities <- c(.25,.5,2,3,5,10,30)
matplot(Maturities, Loadings,type="l",lty=1,col=c("black","red","green"),lwd=3)

Analysis

Interpret the factors by looking at the shapes of the loadings.

Based on the bar graph showing the breakdown of the variance among the seven components, Factor 1 accounts for by far the highest proportion: it is responsible for 97.8% of the variation in the data. The next largest component, Factor 2, explains 1.98%, so the two combined account for over 99.8% of the total variation. The second graph plots the first three factor loadings against maturity, and their shapes suggest the standard interpretation: the first loading is roughly flat across maturities, so Factor 1 acts as a level factor that shifts the whole curve; the second loading rises monotonically from one end of the maturity range to the other, so Factor 2 acts as a slope factor that tilts the curve; and the third loading is U-shaped, so Factor 3 acts as a curvature factor. Because each loading vector is a unit-length eigenvector, its entries lie between -1.00 and +1.00 and measure how strongly each maturity is weighted in the factor. For example, against the USGG3M variable the first three factors have loadings of -0.38 (black line), -0.51 (red line), and 0.53 (green line); the closer a loading is to 1.00 or -1.00, the more heavily that maturity contributes to the factor.
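As a sanity check of this interpretation, the yields can be reconstructed from the means plus the first three factors; a sketch using the objects defined above:

```r
# Rank-3 reconstruction: means plus the first three principal components.
# Any sign conventions chosen for Factors and Loadings cancel in the product.
Reconstructed <- matrix(Means, nrow = nrow(AssignmentData), ncol = 7, byrow = TRUE) +
  Factors %*% t(Loadings)
max(abs(Reconstructed - AssignmentData))  # worst-case reconstruction error
```

Since the first three factors explain over 99.9% of the variance, this error should be small relative to the yields themselves.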

Calculate and plot 3 selected factors

Factors <- Principle.Components[,1:3]
matplot(Factors,type="l",col=c("black","red","green"),lty=1,lwd=3)

Change the signs of the first factor and the corresponding factor loading.

Loadings[,1] <- -Loadings[,1]
Factors[,1] <- -Factors[,1]
matplot(Factors,type="l",col=c("black","red","green"),lty=1,lwd=3)

matplot(Maturities,Loadings,type="l",lty=1,col=c("black","red","green"),lwd=3)

plot(Factors[,1],Factors[,2],type="l",lwd=2)

Analysis

Draw at least three conclusions from the plot of the first two factors above.

After analyzing the plot above, I can draw the following conclusions about the first two factors and their interaction over time. Note that Factor 2 was generally stable over time, while Factor 1 increased steadily. First, the dense, positively sloped region on the left side of the plot suggests a positive relationship between Factor 1 and Factor 2; this would have occurred in the 1980s, when Factor 1 increased rapidly. Second, the condensed, clumped middle part of the plot suggests a period in which Factor 1 and Factor 2 were relatively stable with little volatility, likely the late 80s and early 90s. Finally, the right side of the graph shows a period of high volatility, in which a rise in Factor 1 tends to come with a fall in Factor 2 (and vice versa).

Analyze the adjustments that each factor makes to the term curve.

OldCurve <- AssignmentData[135,]
NewCurve <- AssignmentData[136,]
CurveChange <- NewCurve-OldCurve

FactorsChange <- Factors[136,]-Factors[135,]

ModelCurveAdjustment.1Factor <- OldCurve+t(Loadings[,1])*FactorsChange[1]
ModelCurveAdjustment.2Factors <- OldCurve+t(Loadings[,1])*FactorsChange[1]+t(Loadings[,2])*FactorsChange[2]
ModelCurveAdjustment.3Factors <- OldCurve+t(Loadings[,1])*FactorsChange[1]+t(Loadings[,2])*FactorsChange[2]+
  t(Loadings[,3])*FactorsChange[3]

matplot(Maturities,
        t(rbind(OldCurve,NewCurve,ModelCurveAdjustment.1Factor,ModelCurveAdjustment.2Factors,
                ModelCurveAdjustment.3Factors)),
        type="l",lty=c(1,1,2,2,2),col=c("black","red","green","blue","magenta"),lwd=3,ylab="Curve Adjustment")
legend(x="topright",c("Old Curve","New Curve","1-Factor Adj.","2-Factor Adj.",
                      "3-Factor Adj."),lty=c(1,1,2,2,2),lwd=3,col=c("black","red","green","blue","magenta"))

rbind(CurveChange,ModelCurveAdjustment.3Factors-OldCurve)
##               USGG3M   USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR
## CurveChange 1.070000 1.070000 0.8900000 0.8300000 0.7200000 0.5000000
##             1.090063 1.041267 0.9046108 0.8248257 0.6979317 0.5531734
##              USGG30YR
## CurveChange 0.4700000
##             0.4357793

Analysis

Explain how the shapes of the loadings affect the adjustments when using only factor 1, factors 1 and 2, and all 3 factors.

The line graph above shows that the more factors are added, the closer the adjusted curve gets to the actual new curve. More specifically, it took both Factor 1 and Factor 2 for the adjusted curve to fit well: Factor 1 shifts the whole curve by roughly the same amount at every maturity, while Factor 2 corrects the tilt between the short and long ends. Adding Factor 3 did not change the adjustment significantly.
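The visual impression can be checked numerically by measuring how far each adjusted curve is from the actual new curve; a sketch using the objects defined above:

```r
# Largest absolute gap between each adjusted curve and the new curve;
# the error should shrink as factors are added.
sapply(list(ModelCurveAdjustment.1Factor,
            ModelCurveAdjustment.2Factors,
            ModelCurveAdjustment.3Factors),
       function(adj) max(abs(as.numeric(adj) - NewCurve)))
```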

See the goodness of fit for the example of 10Y yield.

# How close is the approximation for each maturity?
# 10Y
cbind(Maturities,Loadings)
##      Maturities                                 
## [1,]       0.25 0.3839609 -0.50744508  0.5298222
## [2,]       0.50 0.3901870 -0.43946144  0.1114737
## [3,]       2.00 0.4151851 -0.11112721 -0.4187873
## [4,]       3.00 0.4063541  0.01696988 -0.4476561
## [5,]       5.00 0.3860610  0.23140317 -0.2462364
## [6,]      10.00 0.3477544  0.43245979  0.1500903
## [7,]      30.00 0.3047124  0.54421228  0.4979195
# Add back the mean, since the factors were computed from centered data
Model.10Y<-Means[6]+Loadings[6,1]*Factors[,1]+Loadings[6,2]*Factors[,2]+Loadings[6,3]*Factors[,3]
matplot(cbind(AssignmentData[,6],Model.10Y),type="l",lty=1,lwd=c(3,1),col=c("black","red"),ylab="10Y Yield")

Repeat the PCA using princomp.

# Do PCA analysis using princomp()
PCA.Yields<-princomp(AssignmentData)
names(PCA.Yields)
## [1] "sdev"     "loadings" "center"   "scale"    "n.obs"    "scores"  
## [7] "call"

Compare the loadings.

# Check that the loadings are the same
cbind(PCA.Yields$loadings[,1:3],Maturities,Eigen.Decomposition.vectors[,1:3])
##              Comp.1      Comp.2     Comp.3 Maturities           
## USGG3M   -0.3839609  0.50744508  0.5298222       0.25 -0.3839609
## USGG6M   -0.3901870  0.43946144  0.1114737       0.50 -0.3901870
## USGG2YR  -0.4151851  0.11112721 -0.4187873       2.00 -0.4151851
## USGG3YR  -0.4063541 -0.01696988 -0.4476561       3.00 -0.4063541
## USGG5YR  -0.3860610 -0.23140317 -0.2462364       5.00 -0.3860610
## USGG10YR -0.3477544 -0.43245979  0.1500903      10.00 -0.3477544
## USGG30YR -0.3047124 -0.54421228  0.4979195      30.00 -0.3047124
##                                
## USGG3M   -0.50744508  0.5298222
## USGG6M   -0.43946144  0.1114737
## USGG2YR  -0.11112721 -0.4187873
## USGG3YR   0.01696988 -0.4476561
## USGG5YR   0.23140317 -0.2462364
## USGG10YR  0.43245979  0.1500903
## USGG30YR  0.54421228  0.4979195
matplot(Maturities,PCA.Yields$loadings[,1:3],type="l",col=c("black","red","green"),lty=1,lwd=3)

matplot(PCA.Yields$scores[,1:3],type="l",col=c("black","red","green"),lwd=3,lty=1)

Change the signs of the first factor and factor loading again.

PCA.Yields$loadings[,1]<--PCA.Yields$loadings[,1]
PCA.Yields$scores[,1]<--PCA.Yields$scores[,1]
matplot(Maturities,PCA.Yields$loadings[,1:3],type="l",col=c("black","red","green"),lty=1,lwd=3)

matplot(PCA.Yields$scores[,1:3],type="l",col=c("black","red","green"),lwd=3,lty=1)

Uncover the mystery of the Output in column 8.

# What variable we had as Output?
matplot(cbind(PCA.Yields$scores[,1],AssignmentData.Output,Factors[,1]),type="l",col=c("black","red","green"),lwd=c(3,2,1),lty=c(1,2,3),ylab="Factor 1")

The three lines coincide, which identifies Output1 as the first principal-component factor of the yields. To confirm, compare the regression coefficients from Steps 2 and 3 with the factor loadings. First, look at the slopes of the models AssignmentData.Input~AssignmentData.Output.

t(apply(AssignmentData, 2, function(AssignmentData.col) lm(AssignmentData.col~AssignmentData.Output)$coef))
##          (Intercept) AssignmentData.Output
## USGG3M      4.675134             0.3839609
## USGG6M      4.844370             0.3901870
## USGG2YR     5.438888             0.4151851
## USGG3YR     5.644458             0.4063541
## USGG5YR     6.009421             0.3860610
## USGG10YR    6.481316             0.3477544
## USGG30YR    6.869355             0.3047124
cbind(PCA.Yields$center,PCA.Yields$loadings[,1])
##              [,1]      [,2]
## USGG3M   4.675134 0.3839609
## USGG6M   4.844370 0.3901870
## USGG2YR  5.438888 0.4151851
## USGG3YR  5.644458 0.4063541
## USGG5YR  6.009421 0.3860610
## USGG10YR 6.481316 0.3477544
## USGG30YR 6.869355 0.3047124

This shows that the zero loading (the vector of column means stored in PCA.Yields$center) equals the vector of intercepts of the models Y~Output1, where Y is one of the yield columns in the data. Likewise, the slopes of the same models equal the first loading.
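This identity is not specific to the yield data: for any data matrix, regressing each uncentered column on the first principal component returns the column means as intercepts and the first loading as slopes. A self-contained check on synthetic data (seed and dimensions are arbitrary):

```r
# Sketch: intercepts of column ~ factor-1 regressions equal the column means,
# and the slopes equal the first loading vector.
set.seed(7)
X  <- matrix(rnorm(300 * 5), 300, 5) %*% matrix(rnorm(25), 5, 5)
pc <- princomp(X)
f1 <- pc$scores[, 1]                                  # first factor
coefs <- t(apply(X, 2, function(col) lm(col ~ f1)$coef))
cbind(Intercept = coefs[, 1], Center   = pc$center,
      Slope     = coefs[, 2], Loading1 = pc$loadings[, 1])
```

The match is exact: the slope of column j on the first factor is cov(X_j, f1)/var(f1) = lambda_1 v_1j / lambda_1 = v_1j, and the intercept reduces to the column mean because the factor has mean zero.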

Check whether the same holds in the opposite direction: do the coefficients of the models Output1~Yield correspond to the first loading?

# Center each column of the data at its sample mean
AssignmentData.Centered<-t(apply(AssignmentData,1,function(AssignmentData.row) AssignmentData.row-PCA.Yields$center))
dim(AssignmentData.Centered)
## [1] 8300    7
t(apply(AssignmentData.Centered, 2, function(AssignmentData.col) lm(AssignmentData.Output~AssignmentData.col)$coef))
##           (Intercept) AssignmentData.col
## USGG3M   1.420077e-11           2.507561
## USGG6M   1.421187e-11           2.497235
## USGG2YR  1.419747e-11           2.400449
## USGG3YR  1.419989e-11           2.455793
## USGG5YR  1.419549e-11           2.568742
## USGG10YR 1.420297e-11           2.786991
## USGG30YR 1.420965e-11           3.069561

The univariate slopes above do not recover the loadings: each equals cov(Output1, Yield)/var(Yield) rather than the loading itself. To recover the loadings of the first factor by regression, use all inputs together in one multiple regression.

t(lm(AssignmentData.Output~AssignmentData.Centered)$coef)[-1]
## [1] 0.3839609 0.3901870 0.4151851 0.4063541 0.3860610 0.3477544 0.3047124
PCA.Yields$loadings[,1]
##    USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR  USGG30YR 
## 0.3839609 0.3901870 0.4151851 0.4063541 0.3860610 0.3477544 0.3047124

This means that the first factor is a portfolio of all input variables with the following weights:

PCA.Yields$loadings[,1]
##    USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR  USGG30YR 
## 0.3839609 0.3901870 0.4151851 0.4063541 0.3860610 0.3477544 0.3047124
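The portfolio interpretation can be verified directly: the first factor equals the centered inputs multiplied by the first loading vector. A quick self-contained illustration, again with synthetic data standing in for the yields:

```r
# Sketch: the first factor is a fixed-weight portfolio of the centered
# inputs, with the first loading vector as the weights.
set.seed(42)
X  <- matrix(rnorm(200 * 4), 200, 4) %*% matrix(rnorm(16), 4, 4)
pc <- princomp(X)
Xc <- scale(X, center = pc$center, scale = FALSE)   # subtract column means
portfolio <- Xc %*% pc$loadings[, 1]                # weighted sum of inputs
range(portfolio - pc$scores[, 1])                   # differences are ~0
```

The same multiplication with all loading columns reproduces every factor at once, since princomp computes its scores exactly this way.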